violent completion
Muslim-Violence Bias Persists in Debiased GPT Models
Hemmatian, Babak, Baltaji, Razan, Varshney, Lav R.
Abid et al. (2021) showed a tendency in GPT-3 to generate mostly violent completions when prompted about Muslims, compared with other religions. Two pre-registered replication attempts found few violent completions and only a weak anti-Muslim bias in the more recent InstructGPT, fine-tuned to eliminate biased and toxic outputs. However, more pre-registered experiments showed that using common names associated with the religions in prompts increases several-fold the rate of violent completions, revealing a significant second-order anti-Muslim bias. ChatGPT showed a bias many times stronger regardless of prompt format, suggesting that the effects of debiasing were reduced with continued model development. Our content analysis revealed religion-specific themes containing offensive stereotypes across all experiments. Our results show the need for continual de-biasing of models in ways that address both explicit and higher-order associations.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Illinois > Champaign County > Urbana (0.05)
Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts
Hemmatian, Babak, Varshney, Lav R.
Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.
- North America > United States > Illinois > Champaign County > Urbana (0.05)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.05)
- North America > United States > New Mexico (0.04)
- (3 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)
- Media > News (0.47)